Conversation

scotwells (Contributor) commented Jan 9, 2026

Summary

Optimize Clickhouse database schema for platform-wide and user-specific querying of multi-tenant audit log data.

Details

As we began running performance tests against the activity apiserver, we noticed that platform-wide queries were performing drastically worse than tenant-level queries. See datum-cloud/enhancements#536 (comment) for a comparison.

This was a result of our initial schema ordering data by tenant, which meant platform-wide queries had to scan the entire data set instead of skipping over irrelevant rows.

Performance improving changes

  • Moved to daily partitions so that data is TTL'd at a finer granularity and expired partitions are dropped sooner.
  • Removed unnecessary skip indexes on fields already present in the table's sort key; a skip index adds little when the sort key already covers the field.
  • Added skip indexes for fields used in common query patterns to help skip over irrelevant rows.
  • Created new projections designed to efficiently query audit logs across all tenants.
    • The platform-wide projection supports platform administrators querying across all tenants. Queries are most performant when they filter by a specific API group and resource, which will be the most common pattern for cross-tenant queries.
    • Also introduced a projection for user-specific queries to help platform administrators find all audit logs related to a specific user.
  • Moved to hour-bucketed timestamps as the primary key's leading column to improve data locality, compression, and query performance. A sketch of the resulting schema is shown after this list.
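
To make the shape of these changes concrete, here is a minimal sketch of what the resulting table definition could look like. The column list, index types, projection names, retention window, and setting value are illustrative assumptions rather than the exact contents of the actual migration; only the daily partitioning, hour-bucketed sort key, skip indexes, and projection structure reflect the bullets above, and the ReplacingMergeTree engine and its merge setting come from the items further down.

```sql
-- Illustrative sketch only; column names, index types, projection names, and
-- the TTL window are assumptions, not the actual 001_initial_schema.sql.
CREATE TABLE audit_logs
(
    timestamp  DateTime64(3),
    scope_type LowCardinality(String),
    scope_name String,
    user       String,
    audit_id   UUID,
    api_group  LowCardinality(String),
    resource   LowCardinality(String),

    -- Skip indexes only on fields that are not already early in the sort key.
    INDEX idx_api_group api_group TYPE set(100)  GRANULARITY 4,
    INDEX idx_resource  resource  TYPE set(1000) GRANULARITY 4,

    -- Platform-wide projection: leads with api_group/resource so cross-tenant
    -- queries that filter on them can skip most rows.
    PROJECTION platform_wide
    (
        SELECT * ORDER BY (toStartOfHour(timestamp), api_group, resource, audit_id)
    ),

    -- User-centric projection for "all activity by this user" queries.
    PROJECTION by_user
    (
        SELECT * ORDER BY (toStartOfHour(timestamp), user, audit_id)
    )
)
ENGINE = ReplacingMergeTree
PARTITION BY toDate(timestamp)  -- daily partitions, expired per day by the TTL
ORDER BY (toStartOfHour(timestamp), scope_type, scope_name, user, audit_id, timestamp)
PRIMARY KEY (toStartOfHour(timestamp), scope_type, scope_name, user, audit_id)
TTL toDateTime(timestamp) + INTERVAL 30 DAY  -- retention window is a guess
-- Recent ClickHouse versions require an explicit merge mode to combine
-- projections with ReplacingMergeTree (see the merge-behavior commit below).
SETTINGS deduplicate_merge_projection_mode = 'rebuild';
```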

I've modified the 001_initial_schema.sql migration instead of adding a new migration because this service has not been released yet.

Related changes

  • Removed stage from the schema and the querying interface since we're only collecting the ResponseComplete stage from the system.
  • Adjusted the apiserver to change the ORDER BY used when querying ClickHouse so that the appropriate projection is used for the query being performed by the end user (see the query sketch after this list).
  • Updated the performance tests to better reflect real-world querying behavior where the API group / resource are present in the queries.
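
As a rough illustration of that ORDER BY adjustment, here are the two query shapes side by side, written against the sketched schema above; the filter values and exact SQL are placeholders, not the apiserver's generated queries.

```sql
-- Tenant-scoped listing: filters and ordering line up with the base sort key.
SELECT *
FROM audit_logs
WHERE timestamp >= now() - INTERVAL 24 HOUR
  AND scope_type = 'project'          -- placeholder scope type
  AND scope_name = 'example-project'  -- placeholder scope name
ORDER BY toStartOfHour(timestamp), scope_type, scope_name, user, audit_id, timestamp
LIMIT 100;

-- Platform-wide listing: filtering on API group/resource and ordering to match
-- the platform-wide projection's sort key, so the query can be served from the
-- projection instead of scanning every tenant's rows. The apiserver may flip
-- the direction for newest-first pagination.
SELECT *
FROM audit_logs
WHERE timestamp >= now() - INTERVAL 24 HOUR
  AND api_group = 'apps'         -- placeholder API group
  AND resource  = 'deployments'  -- placeholder resource
ORDER BY toStartOfHour(timestamp), api_group, resource, audit_id
LIMIT 100;
```

On recent ClickHouse versions, the `projections` column in `system.query_log` can be used to confirm which projection actually served a query.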

Unrelated changes

  • Upgraded to the v1.9.0 version of our shared actions to resolve an issue with the wrong tag being injected into the kustomize builds.
  • Moved to the ReplacingMergeTree table engine to ensure that all audit logs are unique. Removing duplicates is a background operation, so users may see duplicates if a merge operation hasn't been performed yet. To help prevent duplicates, I adjusted the NATS and Vector configurations to de-duplicate audit logs based on the audit ID, which is guaranteed to be unique since we only collect the ResponseComplete stage. A sketch of query-time deduplication options follows this list.
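
For completeness, here is a sketch of how duplicates that haven't been merged away yet could be handled at query time in ClickHouse; this is illustrative and not necessarily what the activity apiserver does.

```sql
-- 1. Apply merge (deduplication) semantics at read time; exact but more
--    expensive than a plain SELECT.
SELECT *
FROM audit_logs FINAL
WHERE timestamp >= now() - INTERVAL 1 HOUR;

-- 2. Deduplicate explicitly on the audit ID, which is unique because only the
--    ResponseComplete stage is collected.
SELECT
    audit_id,
    any(user)      AS user,
    min(timestamp) AS timestamp
FROM audit_logs
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY audit_id;
```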

Performance test results

These performance tests were executed against a 3-node ClickHouse cluster, with each node allocated 8 CPUs and 30Gi of RAM. The performance test ramps up RPS against the system while querying 24 hours of audit logs. We're collecting roughly 24M audit logs per day.

Previous Clickhouse schema

This shows a performance test run against the activity system while it was focused on tenant-level querying. The graphs show that the activity API would struggle with even a small number of platform-level queries (~4 RPS), with queries immediately beginning to time out.

[image: performance test graphs for the previous schema]

New optimized Clickhouse schema

This performance test demonstrates the improvements after the new schema was applied. We were able to top out at 96.5 RPS with a P99 response time of 2.5 seconds, and no timeouts or errors occurred during the test.

[image: performance test graphs for the optimized schema]

Resources


Relates to datum-cloud/enhancements#536

As we began running performance tests against the activity apiserver, we
noticed that platform-wide queries were performing drastically worse
than tenant-level queries.

This was a result of our initial schema ordering data by tenant, which
meant platform-wide queries had to scan the entire data set instead of
skipping over irrelevant rows.

This change makes several adjustments to the schema to improve querying
performance of the clickhouse database.

- Moved to daily partitions so that partitions can be TTL'd each day
  instead of only when the month is over. This should also mean queries
  scan fewer partitions because all queries will be time-bound.
- Removes unnecessary skip indexes on fields already present in the
  ordering of the data. A skip index provides little benefit when the
  sort key already covers the field.
- Moves to using a ReplacingMergeTree database engine to ensure that
  all audit logs are unique. Removing duplicates is a background
  operation so users _may_ see duplicates if a merge operation hasn't
  been performed. We'll mitigate this in the collection pipeline by
  putting guardrails in place to prevent duplicates from being sent to
  Clickhouse.
- Adds indexes for fields used for common querying patterns to help skip
  over irrelevant rows.
- Creates new projections that are designed to efficiently query audit
  logs across all tenants. The platform-wide query projection is
  designed to support platform administrators querying across all
  tenants. Queries will be most performant when they filter by a
  specific API group and resource, which will be the most common
  querying pattern for cross-tenant queries. Also introduces a query
  projection for user-specific queries to help platform administrators
  query for all audit logs related to a specific user.

I've modified the 001_initial_schema.sql migration instead of adding a
new migration because this service has not been released yet.

I've also removed `stage` from the schema and the querying interface
since we're only collecting the `ResponseComplete` stage from the
system.

The apiserver has also been adjusted to change the ORDER BY used when
querying ClickHouse so that the appropriate projection is used for the
query being performed by the end user.

Lastly, the performance tests have been updated to better reflect
real-world querying behavior where the api group / resource are present
in the queries.

We need to configure the merge behavior of projections since we swapped
over to the ReplacingMergeTree engine.
See: https://clickhouse.com/docs/data-modeling/projections
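
For context, recent ClickHouse versions refuse to keep projections on a ReplacingMergeTree table unless the projection merge behavior is set explicitly; a sketch of the relevant table setting, with the value chosen here being an assumption:

```sql
-- 'rebuild' recomputes projection parts during deduplicating merges so they
-- stay consistent with the base data; 'drop' is cheaper but discards the
-- projection part for merged ranges; the default 'throw' rejects the
-- combination outright. The value used by the actual migration is not shown
-- here, so 'rebuild' is only an assumption.
ALTER TABLE audit_logs
    MODIFY SETTING deduplicate_merge_projection_mode = 'rebuild';
```
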
This change adjusts the NATS stream configuration to support a 10-minute
de-duplication window. The NATS message ID has been set to the audit log
ID since the ID will be unique across all audit logs.
scotwells force-pushed the fix/improve-platform-wide-query-performance branch from e63a8e6 to 21ce2cf on January 9, 2026 22:36
Have to enable JetStream to take advantage of the message_id option.
scotwells force-pushed the fix/improve-platform-wide-query-performance branch from 21ce2cf to a5b914a on January 9, 2026 22:40
Comment on lines 85 to 87
-- Primary key optimized for tenant-scoped queries
-- Deduplication occurs on the full ORDER BY key during merges
ORDER BY (timestamp, scope_type, scope_name, user, audit_id)
scotwells (author) commented:
Talked with @drrev about this ClickHouse schema and he mentioned we should try the following schema change and see if there are any additional performance gains.

ORDER BY (toStartOfHour(timestamp), scope_type, scope_name, user, audit_id, timestamp)
PRIMARY KEY (toStartOfHour(timestamp), scope_type, scope_name, user, audit_id)

Maybe even use toStartOfMinute depending on the volume of data.

scotwells (author) commented:

I ran a performance test against an updated schema that uses toStartOfHour() in the ORDER BY / primary key to group events into 1-hour buckets.

The performance test completed successfully with no errors.

[image: performance test graphs for the hour-bucketed schema]

We were able to top out at 96.5 RPS with a P99 response time of 2.5 seconds.


Pushed a change in a6ce224 that adopts the recommended schema changes and the apiserver pagination implementation.

Thanks @drrev for the recommendation!

drewr (Contributor) commented Jan 12, 2026

How big was the dataset in the tests? For Clickhouse it seems like 60 RPS, especially for reads, is several orders of magnitude below what it should be, even with billions of documents (which I'm assuming we didn't have...).

scotwells (author) replied:
@drewr the system had around 77 million rows in it, spread across ~3 days. Most of the queries in the performance test would have been scanning the last 24 hours, so each query should have been looking at ~25 million rows.

The current schema uses millisecond-precision timestamps as the primary
key's leading column. For a multi-tenant audit logging system ingesting
events from many Kubernetes control planes, this causes:

- Poor data locality: Events from the same tenant arriving at slightly
  different times are scattered across granules
- Suboptimal compression: Similar events aren't co-located, reducing
  compression effectiveness
- Inefficient deduplication: Duplicate events within the same hour
  aren't consistently grouped
- Higher storage costs: More granules and lower compression ratios
  increase disk usage

Hour bucketing addresses these issues by clustering events from the same
tenant/scope/user within hour boundaries, aligning with how audit logs
are queried (typically 1h, 24h, or 7d time ranges).
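
One way to observe the compression and granule effects described above is to compare the active parts of the table before and after the schema change; a sketch, with the table name assumed:

```sql
-- Row counts, compressed vs. uncompressed bytes, and mark (granule) counts for
-- the active parts of the audit log table.
SELECT
    table,
    sum(rows)                    AS total_rows,
    sum(data_compressed_bytes)   AS compressed_bytes,
    sum(data_uncompressed_bytes) AS uncompressed_bytes,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS compression_ratio,
    sum(marks)                   AS marks
FROM system.parts
WHERE active AND table = 'audit_logs'
GROUP BY table;
```
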
ecv left a comment:
PR v.large, approving on trust

scotwells merged commit 5f92a71 into main on Jan 13, 2026
4 checks passed
scotwells deleted the fix/improve-platform-wide-query-performance branch on January 13, 2026 19:14
scotwells added a commit that referenced this pull request Jan 15, 2026
## Summary

This PR makes additional improvements to platform-wide querying
performance, adjusts the clickhouse audit log schema to use the correct
timestamp for the request, adds support for querying by the user's UID,
and adjusts the user-scoped projection to use the user's UID value
instead of the username.

## Details

- **Filter by User's UID** - Filtering by UID lets us narrow results to a
specific user using a stable identifier instead of an email address, which
the user can change. UIDs also only exist for users of the platform;
internal components that authenticate with certificates do not have UIDs,
which gives us a clean way of filtering internal components out of the
audit logs.
- **Request Received Timestamp** - I swapped to using the
`.requestReceivedTimestamp` field of the audit log to represent the
audit log's timestamp since it's the timestamp when the request was
received by the apiserver. The `.stageTimestamp` is used by the
collection pipeline to calculate delays in the pipeline because the
timestamp indicates when the audit log was generated by the apiserver.
- **User UID for user scope** - I swapped to using the user's UID as the
filtering / sorting column when querying the audit log system through
the user scope, since the UID is the stable identifier for the user and
is the value provided in the user's extra information (a rough sketch of
the reworked projection follows this list).
- **Hourly timestamp buckets** - Updated all projections to use the same
hourly time bucketing introduced in #23.
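
A hypothetical sketch of what the reworked user-scoped projection could look like; the `user_uid` column and projection name are assumptions, not the actual migration.

```sql
-- Assumed column user_uid and projection name by_user_uid; shown only to
-- illustrate keying the user scope on the stable UID instead of the username,
-- with the same hourly bucketing as the other projections.
ALTER TABLE audit_logs
    ADD PROJECTION by_user_uid
    (
        SELECT * ORDER BY (toStartOfHour(timestamp), user_uid, audit_id)
    );

-- Existing parts need an explicit materialization for the new projection.
ALTER TABLE audit_logs MATERIALIZE PROJECTION by_user_uid;
```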

---

Relates to datum-cloud/enhancements#536